-
Notifications
You must be signed in to change notification settings - Fork 112
Python array API support #1683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Python array API support #1683
Conversation
ab70b6f to
c698bfa
Compare
|
Would it be a lot of work to put the weighting hierarchy in its own PR? I think there are a bunch of questions to be discussed there, and it should be checked thoroughly whether this really works, independent of the Python Array API generalizations. What certainly needs clarification (if not changes) is that about the methods being "mutually exclusive". It makes some sense from the definition side (we don't want conflicting semantics between e.g. distance and norm). But surely, from the use side we should then have all of them available, as far as mathematically possible? In the most typical use case (say, |
|
About the fact that the weighting refactoring should be in its own PR. I chose to add it here as for me, the python array API allows decoupling the backend classes (which will only add a few attributes to the Weighting) and the abstract class. As i am doing it for the TensorSpace and TensorSpaceElement classes i thought it would fit :) |
|
I did not mean to click "ready for review" |
Well, for one thing, I don't think the current state actually works correctly for a simple inner-product space. In the |
|
Pretty sure all that could be fixed, but the thing is, it's not so clear-cut to say the functionality remains unchanged (and whether that even makes sense) given that the decision logic is set up in a different way now, with a dict-of-methods instead of inheritance. And this would be best investigated in a separate PR. |
|
Okay, got you :) I should have checked the behaviour better. However i don't really understand the point about splitting the PRs. I looked at your pytorch backend PR: the commits are mixed and i think it's still readable. I am also not sure that you ran the tests that would have spotted the errors that we all do while coding. I told you this is a WIP, you told me to make a PR and now you say that it's not thoroughly checked. I guess we should align in terms of how we push changes to this repo. How about discussing live on Friday? |
|
No critique meant. We're moving in the right direction, I just keep underestimating the amount of individual changes and possible implications. The more we separate concerns, the better we can ensure that each change is safe and won't cause more problems further down the road than it fixes. |
odl/space/weighting.py
Outdated
| if is_real_dtype(x2.dtype): | ||
| return np.vecdot(x1.ravel(), x2.ravel()) | ||
| else: | ||
| # This could also be done with `np.vdot`, which has complex conjugation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is now out of date. Fixed in d9575e8
odl/test/space/space_utils_test.py
Outdated
| assert all_equal(x, ['1', '2', 'inf']) | ||
| with pytest.raises(AssertionError): | ||
| x = vector(inp) | ||
| # assert isinstance(x, NumpyTensor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense for us to support string dtypes at all. Leaving those commented assertions makes it unclear what's the intention.
odl/space/entry_points.py
Outdated
| TENSOR_SPACE_IMPLS = { | ||
| 'numpy': NumpyTensorSpace | ||
| } | ||
| AVAILABLE_DEVICES = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't really be needed now that available_devices is in ArrayBackend, right?
I think it would be better if everything that needs to know what devices are available for an impl goes through the corresponding ArrayBackend object instead of fiddling with another global dictionary (which would have to be kept in sync when new backends are added).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the devices must be only accessed through the backend. I used this variable to build the IMPL_DEVICE_PAIRS fixture as we loaded the backends but that's not clear, actually. I will move that in the pytest_config instead :)
odl/trafos/util/ft_utils.py
Outdated
| is_complex_floating_dtype, is_numeric_dtype, is_real_dtype, | ||
| is_real_floating_dtype, is_string, normalized_axes_tuple, | ||
| is_complex_dtype, is_numeric_dtype, is_real_dtype, | ||
| is_floating_dtype, is_string, normalized_axes_tuple, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like this change of naming. After all, complex numbers (at least the standard complex64 and complex128 are also based on floating point, so it's somewhat misleading to exclude them in something called is_floating_dtype.
Why not just leave those names as they are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things to make you reconsider :-)
- as per NumPy's documentation, "There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint) floating point (float) and complex." https://numpy.org/doc/stable/user/basics.types.html in Numerical data types.
- Removing the check for complex inside the is_floating_type did not affect the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per NumPy's documentation, "There are 5 basic numerical types..."
but those are concrete fixed types (up to bit width). The is_XYZ_dtype don't refer to one particular type-flavour, but each discerns a whole bunch of types. And in the same way both int and float are numeric types (meaning for practical purposes that they support + and * and -), both float and complex are floating types, in the sense that they support / and sqrt etc.. So it might make sense to have also an is_floating_dtype which would include both float and complex, as there are many things that should work for both of those. But at any rate we should still have is_real_floating_dtype that specifically only has float32 and float64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay with that!
Making sure that we are consistent with the use of the ArrayBackend class
I suppose you mean Well, we certainly want |
|
Sure, deprecating is not the right word, you know what i mean though. perform arithmetic operations between product spaces and lists so not when |
Ah, like I'm not sure if it should be supported. I tend to say no, it would be better to require it to be written |
| from .utils import get_array_and_backend | ||
| from numbers import Number | ||
| import numpy as np | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this module? What is the intended use case? What are the semantics of the functions, and are they sensible?
I feel like there is kind of a wheel that keeps being reinvented in ODL, first with the various NumPy-based tricks, then with ArrayOnBackendManager, then access to the Array API, finally the ArrayBackend class.
As soon as you have an array_namespace, you can just look up the functions from that. That's certainly not something the user should need to do for basic elementwise stuff etc., but on the other hand functions such as empty_like are pretty low-level (in the ODL perspective). When on that level, how much do we really win by using functions from yet another module, with yet another semantics to get used to, as opposed to looking into the appropriate array_namespace directly at the use site?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider x. We don't know if it's an np.ndarray, a torch.Tensor or a LinearSpaceElement.
We need to have the following machinery:
import odl
x_arr, backend = get_array_and_backend(x)
zeros_like = backend.array_namespace.zeros_like(x_arr)
With the array_creation module, we can have
import odl
zeros_like = odl.zeros_like(like_x_arr)
I think that the get_array_and_backend mechanism is a bit verbose for the user (but perfectly fine for us) and, most importantly, does not provide the documentation of the function in the IDE. I know you are a not a user of such features but i personnaly find it more user friendly.
In my opinion, our users overwhelmingly think in terms of arrays (just by personal experience+seeing the commits of non-core developpers) and will expect ODL to be lenient when it comes to the functions they are used to.
|
Ah, now I get it! Yes, |
We replaced it by looking up the dtype_utils dicts.
@leftaroundabout figured out that python actually follows an interesting pattern for doing the inplace updates. It turns out that our previous implementation created a copy for all the __inplace__ operations. This was due to the fact that x+=y does not just call x.__iadd__(y) but actually uses x = x.__iadd__(y). Previously, iadd returneda copy of the result, and thus the assignment caused id(x) to change.
…he array-API support.
1) Removed the from_dlpack function of the odl namespace. This is because the api is quite unstable at time of development. 2) Edited the some tests.
…rward and backward direction.
…python_array_api_support
That is, avoid using numpy.asarray to extract plain arrays for ODL objects.
Moving the docstrings to the new one.
…dl` main namespace.
This is mostly obsolete, but parts of it could generalize across array backends.
…g the values to plot back to numpy by default.
…, when it's merely not installed on a user's machine.
…nly gives a NumPy array if `impl=numpy`.
1) Removed the old doc folder 2) Moved Justus' new guides in the docs 3) Modified the readthedocs.yaml file by removing the ambiguous --output tag 4) Removed unnecessary requirements from the requirements.txt 5) Added the right path in the generate_doc 6) Modified the year in the conf.py
The aim of this pull request is to make ODL multi backend through the Python Array API.